AITopics | distance measure

Collaborating Authors

distance measure

Information about AI from the News, Publications, and Conferences

Automatic Classification – Tagging and Summarization – Customizable Filtering and Analysis

If you are looking for an answer to the question What is Artificial Intelligence? and you only have a minute, then here's the definition the Association for the Advancement of Artificial Intelligence offers on its home page: "the scientific understanding of the mechanisms underlying thought and intelligent behavior and their embodiment in machines."

However, if you are fortunate enough to have more than a minute, then please get ready to embark upon an exciting journey exploring AI (but beware, it could last a lifetime) …

5545d9bcefb7d03d5ad39a905d14fbe3-Supplemental-Conference.pdf

Neural Information Processing SystemsFeb-9-2026, 01:46:27 GMT

adversarial example, boundary sample, experiment, (14 more...)

Neural Information Processing Systems

Industry: Information Technology (0.35)

Technology: Information Technology > Artificial Intelligence > Machine Learning (0.72)

Add feedback

CROCS: A Two-Stage Clustering Framework for Behaviour-Centric Consumer Segmentation with Smart Meter Data

Yerbury, Luke W., Campello, Ricardo J. G. B., Livingston, G. C. Jr, Goldsworthy, Mark, O'Neil, Lachlan

arXiv.org Machine LearningJan-16-2026

With grid operators confronting rising uncertainty from renewable integration and a broader push toward electrification, Demand-Side Management (DSM) -- particularly Demand Response (DR) -- has attracted significant attention as a cost-effective mechanism for balancing modern electricity systems. Unprecedented volumes of consumption data from a continuing global deployment of smart meters enable consumer segmentation based on real usage behaviours, promising to inform the design of more effective DSM and DR programs. However, existing clustering-based segmentation methods insufficiently reflect the behavioural diversity of consumers, often relying on rigid temporal alignment, and faltering in the presence of anomalies, missing data, or large-scale deployments. To address these challenges, we propose a novel two-stage clustering framework -- Clustered Representations Optimising Consumer Segmentation (CROCS). In the first stage, each consumer's daily load profiles are clustered independently to form a Representative Load Set (RLS), providing a compact summary of their typical diurnal consumption behaviours. In the second stage, consumers are clustered using the Weighted Sum of Minimum Distances (WSMD), a novel set-to-set measure that compares RLSs by accounting for both the prevalence and similarity of those behaviours. Finally, community detection on the WSMD-induced graph reveals higher-order prototypes that embody the shared diurnal behaviours defining consumer groups, enhancing the interpretability of the resulting clusters. Extensive experiments on both synthetic and real Australian smart meter datasets demonstrate that CROCS captures intra-consumer variability, uncovers both synchronous and asynchronous behavioural similarities, and remains robust to anomalies and missing data, while scaling efficiently through natural parallelisation. These results...

data mining, machine learning, pattern recognition, (20 more...)

arXiv.org Machine Learning

2601.10494

Country:

Europe (0.46)
North America > United States (0.45)
Oceania > Australia (0.28)
Asia > China (0.28)

Genre: Research Report > New Finding (0.92)

Industry:

Energy > Renewable (1.00)
Energy > Power Industry (1.00)

Technology:

Information Technology > Internet of Things (1.00)
Information Technology > Data Science > Data Mining (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
(4 more...)

Add feedback

Faster Differentially Private Samplers via Rényi Divergence Analysis of Discretized Langevin MCMC

Neural Information Processing SystemsDec-24-2025, 01:08:45 GMT

Various differentially private algorithms instantiate the exponential mechanism, and require sampling from the distribution $\exp(-f)$ for a suitable function $f$. When the domain of the distribution is high-dimensional, this sampling can be challenging. Using heuristic sampling schemes such as Gibbs sampling does not necessarily lead to provable privacy. When $f$ is convex, techniques from log-concave sampling lead to polynomial-time algorithms, albeit with large polynomials. Langevin dynamics-based algorithms offer much faster alternatives under some distance measures such as statistical distance. In this work, we establish rapid convergence for these algorithms under distance measures more suitable for differential privacy. For smooth, strongly-convex $f$, we give the first results proving convergence in R\'enyi divergence. This gives us fast differentially private algorithms for such $f$. Our techniques and simple and generic and apply also to underdamped Langevin dynamics.

discretized langevin mcmc, faster differentially private sampler, nyi divergence analysis, (7 more...)

Neural Information Processing Systems

Technology: Information Technology > Artificial Intelligence > Machine Learning (0.59)

Add feedback

SPROCKET: Extending ROCKET to Distance-Based Time-Series Transformations With Prototypes

Harner, Nicholas

arXiv.org Artificial IntelligenceDec-10-2025

Classical Time Series Classification algorithms are dominated by feature engineering strategies. One of the most prominent of these transforms is ROCKET, which achieves strong performance through random kernel features. We introduce SPROCKET (Selected Prototype Random Convolutional Kernel Transform), which implements a new feature engineering strategy based on prototypes. On a majority of the UCR and UEA Time Series Classification archives, SPROCKET achieves performance comparable to existing convolutional algorithms and the new MR-HY-SP ( MultiROCKET-HYDRA-SPROCKET) ensemble's average accuracy ranking exceeds HYDRA-MR, the previous best convolutional ensemble's performance. These experimental results demonstrate that prototype-based feature transformation can enhance both accuracy and robustness in time series classification.

data mining, machine learning, sprocket, (14 more...)

arXiv.org Artificial Intelligence

2512.08246

Genre: Research Report > New Finding (0.34)

Technology:

Information Technology > Data Science > Data Mining (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning (0.93)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.67)

Add feedback

A novel k-means clustering approach using two distance measures for Gaussian data

Gada, Naitik

arXiv.org Machine LearningNov-25-2025

Clustering algorithms have long been the topic of research, representing the more popular side of unsupervised learning. Since clustering analysis is one of the best ways to find some clarity and structure within raw data, this paper explores a novel approach to \textit{k}-means clustering. Here we present a \textit{k}-means clustering algorithm that takes both the within cluster distance (WCD) and the inter cluster distance (ICD) as the distance metric to cluster the data into \emph{k} clusters pre-determined by the Calinski-Harabasz criterion in order to provide a more robust output for the clustering analysis. The idea with this approach is that by including both the measurement metrics, the convergence of the data into their clusters becomes solidified and more robust. We run the algorithm with some synthetically produced data and also some benchmark data sets obtained from the UCI repository. The results show that the convergence of the data into their respective clusters is more accurate by using both WCD and ICD measurement metrics. The algorithm is also better at clustering the outliers into their true clusters as opposed to the traditional \textit{k} means method. We also address some interesting possible research topics that reveal themselves as we answer the questions we initially set out to address.

algorithm, cluster center, variance, (13 more...)

arXiv.org Machine Learning

2511.17823

Country:

North America > United States > Wisconsin (0.04)
Europe > Italy (0.04)

Genre: Research Report > New Finding (0.48)

Industry: Health & Medicine (0.70)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Clustering (1.00)

Add feedback

Newton-Flow Particle Filters based on Generalized Cramér Distance

Hanebeck, Uwe D.

arXiv.org Artificial IntelligenceNov-25-2025

We propose a recursive particle filter for high-dimensional problems that inherently never degenerates. The state estimate is represented by deterministic low-discrepancy particle sets. We focus on the measurement update step, where a likelihood function is used for representing the measurement and its uncertainty. This likelihood is progressively introduced into the filtering procedure by homotopy continuation over an artificial time. A generalized Cramér distance between particle sets is derived in closed form that is differentiable and invariant to particle order. A Newton flow then continually minimizes this distance over artificial time and thus smoothly moves particles from prior to posterior density. The new filter is surprisingly simple to implement and very efficient. It just requires a prior particle set and a likelihood function, never estimates densities from samples, and can be used as a plugin replacement for classic approaches.

artificial intelligence, machine learning, particle, (19 more...)

arXiv.org Artificial Intelligence

2509.00182

Country:

North America > United States (1.00)
Asia (0.68)

Genre: Research Report (0.50)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.71)
Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty (0.68)

Add feedback

Rethinking Metrics and Diffusion Architecture for 3D Point Cloud Generation

Bastico, Matteo, Ryckelynck, David, Corté, Laurent, Tillier, Yannick, Decencière, Etienne

arXiv.org Artificial IntelligenceNov-11-2025

As 3D point clouds become a cornerstone of modern technology, the need for sophisticated generative models and reliable evaluation metrics has grown exponentially. In this work, we first expose that some commonly used metrics for evaluating generated point clouds, particularly those based on Chamfer Distance (CD), lack robustness against defects and fail to capture geometric fidelity and local shape consistency when used as quality indicators. W e further show that introducing samples alignment prior to distance calculation and replacing CD with Density-Aware Chamfer Distance (DCD) are simple yet essential steps to ensure the consistency and robustness of point cloud generative model evaluation metrics. While existing metrics primarily focus on directly comparing 3D Euclidean coordinates, we present a novel metric, named Surface Normal Concordance (SNC), which approximates surface similarity by comparing estimated point normals. This new metric, when combined with traditional ones, provides a more comprehensive evaluation of the quality of generated samples. Finally, leveraging recent advancements in transformer-based models for point cloud analysis, such as serialized patch attention, we propose a new architecture for generating high-fidelity 3D structures, the Diffusion Point Transformer . W e perform extensive experiments and comparisons on the ShapeNet dataset, showing that our model outperforms previous solutions, particularly in terms of quality of generated point clouds, achieving new state-of-the-art.

artificial intelligence, machine learning, natural language, (15 more...)

arXiv.org Artificial Intelligence

2511.05308

Country:

Europe (1.00)
North America > United States (0.68)

Genre: Research Report (1.00)

Industry: Transportation (0.47)

Technology:

Information Technology > Artificial Intelligence > Vision (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)
Information Technology > Artificial Intelligence > Natural Language (0.89)

Add feedback

Effectiveness of High-Dimensional Distance Metrics on Solar Flare Time Series

Rohlfing, Elaina, Ahmadzadeh, Azim, Aparna, V

arXiv.org Artificial IntelligenceNov-5-2025

Solar-flare forecasting has been extensively researched yet remains an open problem. In this paper, we investigate the contributions of elastic distance measures for detecting patterns in the solar-flare dataset, SWAN-SF. We employ a simple $k$-medoids clustering algorithm to evaluate the effectiveness of advanced, high-dimensional distance metrics. Our results show that, despite thorough optimization, none of the elastic distances outperform Euclidean distance by a significant margin. We demonstrate that, although elastic measures have shown promise for univariate time series, when applied to the multivariate time series of SWAN-SF, characterized by the high stochasticity of solar activity, they effectively collapse to Euclidean distance. We conduct thousands of experiments and present both quantitative and qualitative evidence supporting this finding.

data mining, distance measure, machine learning, (19 more...)

arXiv.org Artificial Intelligence

2511.01873

Country: North America > United States (0.93)

Genre: Research Report > New Finding (1.00)

Industry:

Energy > Power Industry (0.46)
Energy > Renewable (0.34)

Technology:

Information Technology > Data Science > Data Mining (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Clustering (0.67)

Add feedback

Beyond Point Matching: Evaluating Multiscale Dubuc Distance for Time Series Similarity

Ahmadzadeh, Azim, Khazaei, Mahsa, Rohlfing, Elaina

arXiv.org Artificial IntelligenceOct-28-2025

Abstract--Time series are high-dimensional and complex data objects, making their efficient search and indexing a longstanding challenge in data mining. Building on a recently introduced similarity measure, namely Multiscale Dubuc Distance (MDD), this paper investigates its comparative strengths and limitations relative to the widely used Dynamic Time Warping (DTW). MDD is novel in two key ways: it evaluates time series similarity across multiple temporal scales and avoids point-to-point alignment. We demonstrate that in many scenarios where MDD outperforms DTW, the gains are substantial, and we provide a detailed analysis of the specific performance gaps it addresses. We provide simulations, in addition to the 95 datasets from the UCR archive, to test our hypotheses. Finally, we apply both methods to a challenging real-world classification task and show that MDD yields a significant improvement over DTW, underscoring its practical utility. Time series, or more generally, ordered high-dimensional data types, have become increasingly prevalent with the rise of powerful computational tools and machine learning techniques. In this study, we adopt the term time series as an umbrella label for all such sequential data. A central challenge in analyzing time series lies in defining and measuring similarity. Similarity is inherently subjective, shaped by the specific goals and nuances of a given application. The existing literature has produced a rich landscape of similarity measures, each tailored to specific assumptions and use cases.

artificial intelligence, data mining, machine learning, (18 more...)

arXiv.org Artificial Intelligence

2510.21824

Country: North America > United States > Missouri (0.28)

Genre: Research Report > New Finding (0.88)

Technology:

Information Technology > Data Science > Data Mining (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis > Accuracy (1.00)

Add feedback

On pattern classification with weighted dimensions

Mollah, Ayatullah Faruk

arXiv.org Artificial IntelligenceOct-24-2025

Studies on various facets of pattern classification is often imperative while working with multi - dimensional samples pertaining to diverse application scenarios. In this notion, w eighted dimension - based distance measure has been one of the vital considerat ions in pattern analysis as it reflects the degree of similarity between samples . Though it is often presumed to be settled with the pervasive use of Euclidean distance, plethora of issues often surface. In this paper, we present (a) a detail analysis on t he impact of distance measure norms and weights of dimensions along with visualization, (b) a novel weighting scheme for each dimension, (c) incorporation of this dimensional weighting schema in to a KNN classifier, and (d) pattern classification on a varie ty of synthetic as well as realistic datasets with the developed model . It has perform ed well across diverse experiments in comparison to the traditional KNN under the same experimental setups. Specifically, for gene expression datasets, it yields signific ant and consistent gain in classification accuracy (around 10%) in all cross - validation experiments with different values of k. As such datasets contain limited number of samples of high dimensions, meaningful selection of nearest neighbours is desirable, and this requirement is reasonably met by regulat ing the shape and size of the region enclos ing the k number of reference samples with the developed weighting schema and appropriate norm . I t, therefore, stands as an important generalization of K NN classifier powered by weighted Minkowski distance with the present weighting schema .

artificial intelligence, dimension, machine learning, (19 more...)

arXiv.org Artificial Intelligence

2510.20107

Country: Asia > India (0.14)

Genre: Research Report (0.50)

Industry: Health & Medicine > Therapeutic Area > Oncology (0.96)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Nearest Neighbor Methods (0.49)

Add feedback